Design of efficient classifier integration and performance evaluation in machine learning
Author
Abstract
The characteristics of any classifier depend heavily on the nature of the dataset used for training and verification. Application areas such as health care suffer from a lack of large, suitable datasets. A classifier designed for health care should show good generalization and robustness so that its results can be accepted with high reliability and confidence. This paper presents the consistency problem associated with classifiers, which is a major issue from a practical point of view. Forming a committee of experts is a natural way to increase the reliability of a classifier design, but the way the members are integrated governs the final performance. To overcome the problems of generalization and consistency, two methods for building a mixture of classifiers, namely TMQD and MVFD, are presented. Estimating the quality of a classifier is a challenging task for researchers, because no single parameter alone represents its absolute performance. To measure classifier quality, the receiver operating characteristic (ROC) curve is a better choice than conventional parameters such as sensitivity and specificity, which describe performance at a single threshold only; yet ROC analysis is rarely seen in practical health care settings. This paper also presents a detailed explanation of the ROC curve and of the estimation of the area under it (AUC). Selection of the threshold value is one of the most important factors determining the performance of a classifier, and its dependence on the population and geographical area makes an optimal value difficult to decide. A graphical approach is presented for selecting the best threshold value according to the environment and need.

Index Terms – Data Mining, Classifier, Classifier integration, ROC, Area under ROC, Sensitivity, Specificity, Heart Diseases, Neural Networks
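As a minimal sketch of one possible integration rule for a committee of classifiers, the Python below combines members by simple, unweighted majority voting. This is not the paper's TMQD or MVFD method, whose details are not given in the abstract; the names `majority_vote` and `make_threshold_rule` and the toy threshold members are hypothetical and only illustrate how the integration step decides the committee's output.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A committee member is any function mapping a feature vector to a class label.
Classifier = Callable[[Sequence[float]], int]


def majority_vote(committee: Sequence[Classifier], x: Sequence[float]) -> int:
    """Return the label predicted by the largest number of members.

    Ties go to the tied label that was voted for first, following the
    order of the members in `committee` (Counter.most_common behaviour).
    """
    if not committee:
        raise ValueError("the committee must contain at least one classifier")
    votes: List[int] = [member(x) for member in committee]
    label, _count = Counter(votes).most_common(1)[0]
    return label


def make_threshold_rule(feature_index: int, cutoff: float) -> Classifier:
    # Hypothetical toy member: predicts class 1 when one feature exceeds a cutoff.
    return lambda x: int(x[feature_index] > cutoff)


committee = [
    make_threshold_rule(0, 0.5),
    make_threshold_rule(1, 0.3),
    make_threshold_rule(2, 0.7),
]

print(majority_vote(committee, [0.6, 0.2, 0.9]))  # votes 1, 0, 1 -> prints 1
```

Weighting each member, for example by its validation accuracy, is a common alternative rule; swapping one rule for another can change the committee's output on the same members, which is why the integration step matters.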
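The ROC curve and its area can be computed directly from a classifier's output scores. The sketch below assumes binary labels (1 = diseased, 0 = healthy) and a rule that predicts positive when the score is at or above the threshold; it sweeps the threshold over every distinct score, records (1 − specificity, sensitivity) at each step, and integrates the resulting curve with the trapezoidal rule. The function names and the six-case dataset are illustrative, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity (TPR) and specificity (TNR) when score >= threshold means positive."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = sum(s >= threshold for s in pos) / len(pos)
    specificity = sum(s < threshold for s in neg) / len(neg)
    return sensitivity, specificity


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points, from the strictest threshold to the most lenient."""
    # +inf classifies everything as negative, giving the (0, 0) corner;
    # the smallest score classifies everything as positive, giving (1, 1).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        sens, spec = sensitivity_specificity(labels, scores, t)
        points.append((1.0 - spec, sens))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve whose x values never decrease."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(round(auc_trapezoid(points), 4))  # 8/9 -> prints 0.8889
```

As a sanity check, the trapezoidal AUC equals the fraction of (positive, negative) pairs in which the positive case gets the higher score (the Mann-Whitney statistic, ties counted half): here 8 of the 9 pairs are ranked correctly.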
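The abstract does not spell out its graphical approach to threshold selection, so the following is only one plausible way to make the choice explicit: tabulate sensitivity and specificity against every candidate threshold (the data one would plot) and pick the threshold that maximizes a weighted sum of the two. The `weight` parameter is a hypothetical stand-in for "environment and need", for example favouring sensitivity in screening; with `weight=0.5` the rule maximizes Youden's index J = sensitivity + specificity − 1. All names and data are illustrative.

```python
from typing import List, Sequence, Tuple


def sens_spec(labels: Sequence[int], scores: Sequence[float], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold means positive."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    return sum(s >= threshold for s in pos) / len(pos), sum(s < threshold for s in neg) / len(neg)


def threshold_table(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """(threshold, sensitivity, specificity) for every distinct score, ascending."""
    return [(t, *sens_spec(labels, scores, t)) for t in sorted(set(scores))]


def pick_threshold(labels: Sequence[int], scores: Sequence[float], weight: float = 0.5) -> float:
    """Threshold maximizing weight*sensitivity + (1 - weight)*specificity.

    Ties go to the lowest threshold, because max() keeps the first maximum.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    best = max(
        threshold_table(labels, scores),
        key=lambda row: weight * row[1] + (1.0 - weight) * row[2],
    )
    return best[0]


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
for t, sens, spec in threshold_table(labels, scores):
    print(f"threshold={t:.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")

print(pick_threshold(labels, scores, weight=0.8))  # sensitivity-favouring setting -> 0.6
print(pick_threshold(labels, scores, weight=0.2))  # specificity-favouring setting -> 0.8
```

Plotting the two columns of the table against the threshold gives the kind of picture on which a population- or region-specific choice can be made; the weight simply encodes which error the setting can least afford.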